183 research outputs found
EC-Conf: An Ultra-fast Diffusion Model for Molecular Conformation Generation with Equivariant Consistency
Despite recent advances in 3D molecular conformation generation driven by
diffusion models, the high computational cost of the iterative diffusion/denoising
process limits their application. In this paper, an equivariant consistency model
(EC-Conf) was proposed as a fast diffusion method for low-energy conformation
generation. In EC-Conf, a modified SE(3)-equivariant transformer model was
directly used to encode the Cartesian molecular conformations and a highly
efficient consistency diffusion process was carried out to generate molecular
conformations. With only one sampling step, EC-Conf already achieves quality
comparable to that of other diffusion-based models running with thousands of
denoising steps. Its performance can be further improved with a few
more sampling iterations. The performance of EC-Conf is evaluated on both
GEOM-QM9 and GEOM-Drugs sets. Our results demonstrate that the efficiency of
EC-Conf in learning the distribution of low-energy molecular conformations is
at least two orders of magnitude higher than that of current SOTA diffusion
models, and that EC-Conf could potentially become a useful tool for conformation
generation and sampling.
Comment: 10 pages, 3 figures
Communicative Message Passing for Inductive Relation Reasoning
Relation prediction for knowledge graphs aims at predicting missing
relationships between entities. Despite the importance of inductive relation
prediction, most previous works are limited to a transductive setting and
cannot process previously unseen entities. Recently proposed subgraph-based
relation reasoning models provide an alternative, predicting links from the
subgraph structure surrounding a candidate triplet inductively. However, we
observe that these methods often neglect the directed nature of the extracted
subgraph and weaken the role of relation information in the subgraph modeling.
As a result, they fail to effectively handle the asymmetric/anti-symmetric
triplets and produce insufficient embeddings for the target triplets. To this
end, we introduce a Communicative Message Passing neural network for Inductive
reLation rEasoning, CoMPILE, that reasons over local directed subgraph
structures and has a vigorous inductive bias to process entity-independent
semantic relations. In contrast to existing models, CoMPILE strengthens the
message interactions between edges and entities through a communicative kernel
and enables a sufficient flow of relation information. Moreover, we demonstrate
that CoMPILE can naturally handle asymmetric/anti-symmetric relations without
the need for explosively increasing the number of model parameters by
extracting the directed enclosing subgraphs. Extensive experiments show
substantial performance gains in comparison to state-of-the-art methods on
commonly used benchmark datasets with various inductive settings.
Comment: Accepted by AAAI-202
Prediction and validation of the unexplored RNA-binding protein atlas of the human proteome
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based function-prediction technique for RBPs, SPOT-Seq, is applied to the human proteome, and its results are validated against a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for the 860 newly discovered human mRBPs. This consistent sensitivity indicates the robust performance of SPOT-Seq in predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of the predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs, along with confidence levels and complex structures, is available at http://sparks-lab.org (in publications) for experimental confirmation and hypothesis generation.
Template-Based Structure Prediction and Classification of Transcription Factors in Arabidopsis thaliana
Transcription factors (TFs) play important roles in plants. However, there is no systematic study of the structures and functions of most TFs in plants. Here, we performed template-based structure prediction for all TFs in Arabidopsis thaliana, using their full-length sequences as well as their C-terminal and N-terminal regions. A total of 2,918 model structures were obtained with a high confidence score. We find that TF families employ only a small number of templates for DNA-binding domains (DBD) but a diverse set of templates for transcription regulatory domains (TRD). Although TF families are classified according to DBD, their sizes have a significant correlation with the number of unique non-DNA-binding templates employed in the family (Pearson correlation coefficient of 0.74). That is, the size of a TF family is related to its functional diversity. Network analysis reveals new connections between TF families based on shared TRD or DBD templates; 81% of TF families share DBD templates and 67% share TRD templates. Two large fully connected family clusters are observed in this network, along with 69 island families. In addition, 25 genes with unknown functions are found to be DNA-binding proteins and/or transcription factors according to their predicted structures. This work provides a global view of the classification of TFs based on their DBD or TRD templates and, hence, a deeper understanding of DNA-binding and regulatory functions from a structural perspective. All structural models of TFs are deposited in the online database for public use at http://sysbio.unl.edu/AthTF
- …